Econometrics 101 - Formula Reference Sheet

Quick Method Selection Guide

Method	When to Use	Key Assumption	Estimate
OLS	Selection on observables	E[ε|X] = 0	ATE (if CIA holds)
IV/2SLS	Endogenous treatment	Exclusion restriction	LATE for compliers
DiD	Policy changes over time	Parallel trends	ATT for treated units
RDD	Assignment by threshold	No manipulation	LATE at cutoff
Fixed Effects	Panel data, unobserved heterogeneity	Strict exogeneity	Within estimator

Basic Regression & OLS

Population Model

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε

β₁ = marginal effect of X₁ on Y, holding other X's constant

OLS Estimator

β̂ = (X'X)⁻¹X'Y

Minimizes sum of squared residuals
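
The estimator above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not a replacement for a regression package; the function name ols_coefficients and the simulated coefficients (1, 2, -0.5) are assumptions made for the example.

```python
import numpy as np

def ols_coefficients(X, y):
    """Return OLS coefficients for a design matrix X (intercept column included)."""
    # Solve the normal equations (X'X) b = X'y; this equals (X'X)^{-1} X'y
    # but avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated data (hypothetical): y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])  # first column is the intercept
beta_hat = ols_coefficients(X, y)
print(beta_hat)  # should be close to the simulated coefficients
```

Solving the normal equations directly is numerically safer than inverting X'X, although both give the same β̂ when X has full column rank (A3).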

Standard Errors

SE(β̂) = √diag{σ²(X'X)⁻¹}

where σ̂² = RSS/(n-k-1) estimates σ² (n observations, k regressors plus an intercept)

t-statistic

t = β̂/SE(β̂)

H₀: β = 0, reject if |t| > 1.96 (two-sided test, α = 0.05, large-sample normal approximation)
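
The classical standard errors and t-statistics can be computed directly from these formulas. The sketch below assumes homoskedastic errors (A5) and uses simulated data; the function name ols_with_se and the irrelevant regressor z are hypothetical, added only to illustrate the test.

```python
import numpy as np

def ols_with_se(X, y):
    """Return (beta_hat, classical SEs, t-statistics) for an OLS fit."""
    n, p = X.shape                      # p = k + 1 (k regressors plus intercept)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)             # RSS / (n - k - 1)
    se = np.sqrt(np.diag(sigma2_hat * XtX_inv))      # sqrt of diag{sigma^2 (X'X)^-1}
    return beta_hat, se, beta_hat / se

# Simulated data (hypothetical): z has a true coefficient of zero
rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, z])
beta_hat, se, t_stats = ols_with_se(X, y)
reject = np.abs(t_stats) > 1.96         # two-sided test at alpha = 0.05
print(beta_hat, se, t_stats, reject)
```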

OLS Assumptions (Gauss-Markov)

A1: Linear in parameters
A2: Random sampling
A3: No perfect collinearity
A4: Zero conditional mean: E[ε|X] = 0
A5: Homoskedasticity: Var(ε|X) = σ²
A6: Normality: ε|X ~ N(0, σ²) [not a Gauss-Markov assumption; needed only for exact small-sample inference]

Robust SEs (Heteroskedasticity)

Stata: reg y x, robust
R: lm_robust(y ~ x, data)
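
For intuition, a heteroskedasticity-robust covariance matrix is a "sandwich": (X'X)⁻¹ X'diag(eᵢ²)X (X'X)⁻¹. The sketch below uses the HC1 finite-sample scaling n/(n-k-1), which is the variant Stata's robust option applies; other software may default to a different variant (HC2, HC3). The function name and the simulated data are assumptions for illustration.

```python
import numpy as np

def classical_and_hc1_se(X, y):
    """Return (beta_hat, classical SEs, HC1 robust SEs) for an OLS fit."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    # Classical covariance: sigma^2 (X'X)^-1
    classical_cov = (resid @ resid / (n - p)) * XtX_inv
    # Sandwich "meat": sum over i of e_i^2 * x_i x_i'
    meat = (X * resid[:, None] ** 2).T @ X
    hc1_cov = XtX_inv @ meat @ XtX_inv * n / (n - p)
    return beta_hat, np.sqrt(np.diag(classical_cov)), np.sqrt(np.diag(hc1_cov))

# Simulated data (hypothetical): error variance grows with |x|
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_hat, classical_se, robust_se = classical_and_hc1_se(X, y)
print(classical_se, robust_se)
```

In this simulation the classical formula understates the uncertainty of the slope, so the robust SE for x comes out larger than the classical one.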

Clustered SEs

Stata: reg y x, cluster(id)
R: lm_robust(y ~ x, clusters = id)

Instrumental Variables (IV/2SLS)

Structural Equation

Y = β₀ + β₁X + β₂Z + ε

X is endogenous, Z are exogenous controls

First Stage

X = π₀ + π₁IV + π₂Z + v

IV predicts endogenous variable X

Reduced Form

Y = γ₀ + γ₁IV + γ₂Z + u

Total effect of IV on outcome

IV Estimator (Wald)

β̂₁ᴵⱽ = γ₁/π₁ = Cov(Y,IV)/Cov(X,IV)

Ratio of reduced form to first stage (the covariance form holds when there are no controls Z)
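
A small simulation shows the Wald ratio at work. This sketch assumes a single instrument and no controls; the variable names, the confounder u, and the true effect of 2.0 are all made up for the example.

```python
import numpy as np

def wald_iv(y, x, iv):
    """IV estimate as reduced-form slope divided by first-stage slope."""
    var_iv = np.var(iv, ddof=1)
    gamma1 = np.cov(y, iv)[0, 1] / var_iv   # reduced form: slope of y on iv
    pi1 = np.cov(x, iv)[0, 1] / var_iv      # first stage: slope of x on iv
    return gamma1 / pi1

# Simulated data (hypothetical): u confounds x and y, iv affects y only through x
rng = np.random.default_rng(3)
n = 20000
iv = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * iv + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

beta_iv = wald_iv(y, x, iv)
beta_ols = np.cov(y, x)[0, 1] / np.var(x, ddof=1)   # biased by the confounder
print(beta_iv, beta_ols)
```

Because u raises both x and y, the OLS slope is biased upward, while the IV ratio stays close to the simulated effect.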

IV Assumptions

Relevance: Cov(IV, X) ≠ 0 (first stage F > 10)
Exclusion: Cov(IV, ε) = 0 (only affects Y through X)
Monotonicity: The instrument moves X in the same direction for all units (no defiers)
Independence: IV as good as randomly assigned (unrelated to unobserved determinants of X and Y)

Key Diagnostic Tests

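Weak IV Test: First-stage F > 10 (Staiger-Stock rule of thumb; compare with Stock-Yogo critical values)
Overidentification: Hansen J-test (p > 0.05: fail to reject instrument validity; requires more instruments than endogenous regressors)
Endogeneity Test: Hausman test (compare OLS vs IV; rejection suggests X is endogenous)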

Stata: ivregress 2sls y (x = iv) z, first
R: ivreg(y ~ x + z | iv + z, data = data)

Difference-in-Differences (DiD)

DiD Estimator

δ̂ = (Ȳ₁ᵀ - Ȳ₀ᵀ) - (Ȳ₁ᶜ - Ȳ₀ᶜ)

Treatment effect = difference in differences

Regression Specification

Y_{it} = β₀ + β₁Treat_i + β₂Post_t + β₃(Treat_i×Post_t) + X_{it}'γ + ε_{it}

β₃ is the DiD estimate
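
Without covariates, the regression coefficient on the interaction equals the difference of the four group means exactly. The sketch below checks this on simulated repeated cross-section data; the variable names and the true effect of 2.0 are assumptions for illustration.

```python
import numpy as np

# Simulated data (hypothetical): treated units gain 2.0 after the policy
rng = np.random.default_rng(4)
n = 4000
treat = rng.integers(0, 2, size=n)      # 1 = treated group
post = rng.integers(0, 2, size=n)       # 1 = post-treatment period
y = 1.0 + 0.5 * treat + 1.0 * post + 2.0 * treat * post + rng.normal(size=n)

def group_mean(t, p):
    """Mean outcome for group t (treated/control) in period p (pre/post)."""
    return y[(treat == t) & (post == p)].mean()

# Difference in differences from the four cell means
did_means = (group_mean(1, 1) - group_mean(1, 0)) - (group_mean(0, 1) - group_mean(0, 0))

# Same quantity from the interaction regression (no covariates)
X = np.column_stack([np.ones(n), treat, post, treat * post])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
did_reg = beta[3]
print(did_means, did_reg)
```

The two numbers agree to floating-point precision because the regression is saturated in Treat and Post; adding covariates X_{it} breaks the exact equality.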

With Fixed Effects

Y_{it} = α_i + λ_t + β(Treat_i×Post_t) + X_{it}'γ + ε_{it}

α_i = unit FE, λ_t = time FE

Event Study

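Y_{it} = α_i + λ_t + Σₖ βₖD^k_{it} + X_{it}'γ + ε_{it}

D^k_{it} = 1 if unit i is k periods from treatment at time t; βₖ = effect k periods relative to treatment (one pre-period is omitted as the reference)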

DiD Assumptions

Parallel Trends: E[Y₁(0) - Y₀(0) | X, D=1] = E[Y₁(0) - Y₀(0) | X, D=0], where Yₜ(0) is the untreated potential outcome in period t (0 = pre, 1 = post)
No Anticipation: Treatment doesn't affect pre-treatment outcomes
SUTVA: No spillovers between units
Common Shocks: Time effects same for treated/control

Validity Tests

Pre-Trends Test: Jointly test that the pre-treatment coefficients (all βₖ with k < 0) equal zero in the event study
Placebo Test: No effect on unaffected outcomes
Balance Test: Pre-treatment characteristics similar

Stata: reghdfe y treat##post, absorb(id time)
R: feols(y ~ treat:post | id + time, data)

Regression Discontinuity (RDD)

Sharp RDD

Y_i = α + βT_i + f(X_i - c) + ε_i

T_i = 1 if X_i ≥ c, β = treatment effect

Fuzzy RDD (First Stage)

D_i = γ + δT_i + g(X_i - c) + v_i

D_i = actual treatment, δ = jump in treatment probability at the cutoff; fuzzy RDD estimate = (jump in Y at cutoff)/δ

Local Linear Estimation

Y_i = α + β·1{X_i ≥ c} + γ(X_i - c) + δ(X_i - c)·1{X_i ≥ c} + ε_i

Linear slopes on each side of cutoff
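
The local linear specification amounts to OLS on observations inside a window around the cutoff. The sketch below uses a uniform kernel and a fixed, hand-picked bandwidth of 0.5 (an assumption, not a data-driven choice); the function name and the simulated jump of 1.0 are hypothetical.

```python
import numpy as np

def local_linear_rdd(y, x, cutoff, bandwidth):
    """Sharp RDD jump from a local linear fit with a uniform kernel."""
    keep = np.abs(x - cutoff) <= bandwidth        # observations inside the window
    xc = x[keep] - cutoff                         # centered running variable
    above = (xc >= 0).astype(float)               # 1{X_i >= c}
    # Columns: intercept, jump at cutoff, slope below, change in slope above
    X = np.column_stack([np.ones(xc.size), above, xc, xc * above])
    coef = np.linalg.lstsq(X, y[keep], rcond=None)[0]
    return coef[1]                                # beta: discontinuity at the cutoff

# Simulated data (hypothetical): outcome jumps by 1.0 at x = 0
rng = np.random.default_rng(5)
n = 5000
x = rng.uniform(-1, 1, size=n)
above_cut = (x >= 0).astype(float)
y = 0.5 + 1.0 * above_cut + 0.8 * x + 0.4 * x * above_cut + rng.normal(scale=0.3, size=n)

jump = local_linear_rdd(y, x, cutoff=0.0, bandwidth=0.5)
print(jump)
```

In applied work the bandwidth and kernel are usually chosen by a dedicated routine such as rdrobust; re-running the function with several bandwidths is a simple sensitivity check.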

Optimal Bandwidth

h* = C_K[(2σ²)/(f(c)·m₂²)]^(1/5)·n^(-1/5)

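Imbens-Kalyanaraman bandwidth (simplified form): σ² = conditional variance of Y at the cutoff (assumed equal on both sides), f(c) = density of X at c, m₂ = difference in second derivatives of E[Y|X] across the cutoff, C_K = kernel-specific constant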

RDD Assumptions

No Manipulation: Smooth density of running variable at cutoff
Continuity: E[Y₀|X] continuous at cutoff
Local Randomization: As-good-as-random near cutoff

Validity Tests

McCrary Test: H₀: θ = ln(f₊) - ln(f₋) = 0 (no jump in the density of X at the cutoff); θ̂ is asymptotically normal
Covariate Balance: No jumps in pre-treatment variables
Bandwidth Sensitivity: Results stable across bandwidths
Placebo Cutoffs: No effects at fake thresholds

Stata: rdrobust y x, c(cutoff)
R: rdrobust(y, x, c = cutoff)

Panel Data & Fixed Effects

One-Way Fixed Effects

Y_{it} = α_i + βX_{it} + ε_{it}

α_i = individual-specific effect

Two-Way Fixed Effects

Y_{it} = α_i + λ_t + βX_{it} + ε_{it}

Controls individual + time effects

Within Estimator

β̂_{FE} = [Σᵢ Σₜ (X_{it} - X̄_i)(X_{it} - X̄_i)']⁻¹ Σᵢ Σₜ (X_{it} - X̄_i)(Y_{it} - Ȳ_i)

Uses within-unit variation only
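
With a single regressor, the within estimator reduces to a regression of unit-demeaned Y on unit-demeaned X. The sketch below assumes a balanced simulated panel in which α_i is correlated with X, so pooled OLS is biased; the function name and all simulated values are assumptions for illustration.

```python
import numpy as np

def within_estimator(y, x, unit_ids):
    """Fixed-effects slope for one regressor via unit demeaning."""
    _, idx = np.unique(unit_ids, return_inverse=True)
    counts = np.bincount(idx)
    x_bar = np.bincount(idx, weights=x) / counts   # unit means of x
    y_bar = np.bincount(idx, weights=y) / counts   # unit means of y
    x_dm = x - x_bar[idx]                          # X_it - X-bar_i
    y_dm = y - y_bar[idx]                          # Y_it - Y-bar_i
    return (x_dm @ y_dm) / (x_dm @ x_dm)

# Simulated panel (hypothetical): alpha_i raises both x and y
rng = np.random.default_rng(6)
n_units, n_periods = 200, 5
unit_ids = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(size=n_units)[unit_ids]
x = alpha + rng.normal(size=unit_ids.size)
y = 2.0 * alpha + 1.5 * x + rng.normal(scale=0.5, size=unit_ids.size)

beta_fe = within_estimator(y, x, unit_ids)
beta_pooled = np.cov(y, x)[0, 1] / np.var(x, ddof=1)   # ignores alpha_i
print(beta_fe, beta_pooled)
```

Demeaning removes α_i, so the within estimate stays near the simulated slope of 1.5 while the pooled slope absorbs part of the unit effect.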

Random Effects

Y_{it} = β₀ + βX_{it} + α_i + ε_{it}

α_i ~ N(0, σ²_α), uncorrelated with X

Fixed Effects Assumptions

Strict Exogeneity: E[ε_{it}|X_i, α_i] = 0 for all t
No Time-Varying Confounders: No unit-specific shocks correlated with X_{it} (time FE absorb only shocks common to all units)
Sufficient Variation: X_{it} varies within units over time

Panel Data Tests

Hausman Test: H₀: α_i uncorrelated with X, so RE is consistent and efficient (use RE if p > 0.05, FE otherwise)
F-test for FE: H₀: α_i = α for all i
Serial Correlation: Wooldridge test for AR(1)

Stata: xtreg y x, fe (after xtset id year)
R: feols(y ~ x | id, data) (use | id + year for two-way FE)

Standard Error Types & When to Use

SE Type	When to Use	Stata Command	R Command
Classical	Homoskedastic errors	reg y x	lm(y ~ x)
Robust	Heteroskedasticity	reg y x, robust	lm_robust(y ~ x)
Clustered	Within-cluster correlation	reg y x, cluster(id)	lm_robust(y ~ x, clusters = id)
Bootstrap	Non-standard distributions	bootstrap: reg y x	boot package
Panel Robust	Panel heteroskedasticity	xtreg y x, fe robust	feols(y ~ x | id, vcov = "hetero")

Critical Values & Rules of Thumb

t-test: |t| > 1.96 for significance at α = 0.05
F-test (weak IV): F > 10 (Staiger-Stock rule of thumb); F > 104.7 for a valid 5% t-test (Lee et al. 2022)
R²: 0.02 small, 0.13 medium, 0.26 large effect sizes (Cohen's benchmarks)
DW test: 1.5 < DW < 2.5 suggests no serial correlation
VIF: VIF > 10 suggests multicollinearity problems
Cook's D: > 4/n suggests influential observations

Econometrics 101 Formula Reference Sheet • ImpactMojo Knowledge Series
Licensed under CC BY-NC-ND 4.0 • Free to use with attribution • www.impactmojo.in